
    Cross-Entropy Clustering

    We construct a cross-entropy clustering (CEC) theory which finds the optimal number of clusters by automatically removing groups which carry no information. Moreover, our theory gives a simple and efficient criterion to verify cluster validity. Although CEC can be built on an arbitrary family of densities, in the most important case of Gaussian CEC: the division into clusters is affine invariant; the clustering has a tendency to divide the data into ellipsoid-type shapes; and the approach is computationally efficient, as we can apply the Hartigan approach. We also study with particular attention clustering based on spherical Gaussian densities and on Gaussian densities with covariance s·I. In the latter case we show that, as s converges to zero, we obtain the classical k-means clustering.
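    For intuition, here is a minimal Python sketch of the Gaussian CEC cost (function and variable names are ours, not the paper's): a cluster of weight p contributes p·(−ln p + H), where H, the cross-entropy of the cluster against its maximum-likelihood Gaussian, reduces to a log-determinant term.

```python
import numpy as np

def gaussian_cec_cost(X, labels, k):
    """CEC cost for the Gaussian family: cluster i with weight p_i adds
    p_i * (-ln p_i + H_i), where the cross-entropy of cluster i against
    its ML Gaussian is H_i = (n/2) ln(2*pi*e) + (1/2) ln det(Sigma_i)."""
    n_samples, n = X.shape
    cost = 0.0
    for i in range(k):
        Xi = X[labels == i]
        if len(Xi) <= n:            # empty/degenerate groups carry
            continue                # no information and are dropped
        p = len(Xi) / n_samples
        Sigma = np.cov(Xi, rowvar=False, bias=True) + 1e-9 * np.eye(n)
        H = 0.5 * n * np.log(2 * np.pi * np.e) \
            + 0.5 * np.linalg.slogdet(Sigma)[1]
        cost += p * (-np.log(p) + H)
    return cost
```

    A Hartigan-style pass then moves points between clusters whenever the move lowers this cost, and a cluster is removed once dropping it lowers the cost, which is how the optimal number of clusters emerges.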

    Uniform Cross-entropy Clustering

    Robust mixture-model approaches that use non-normal distributions have recently been extended to accommodate data with fixed bounds. In this article we propose a new method based on uniform distributions and Cross-Entropy Clustering (CEC). We combine a simple density model with a clustering method that treats groups separately and estimates parameters in each cluster individually. Consequently, we introduce an effective clustering algorithm which deals with non-normal data.
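    As a rough illustration (not the paper's exact construction), the CEC cost can be evaluated with a uniform density per cluster; assuming an axis-aligned bounding box as the support, the cross-entropy term becomes the log-volume of the box:

```python
import numpy as np

def uniform_cec_cost(X, labels, k):
    """Sketch of a CEC cost with uniform densities: cluster i of weight
    p_i, supported on its axis-aligned bounding box (our simplifying
    assumption), contributes p_i * (-ln p_i + ln vol(box_i)), since a
    uniform density 1/vol has cross-entropy ln vol for points inside."""
    n_samples, _ = X.shape
    cost = 0.0
    for i in range(k):
        Xi = X[labels == i]
        if len(Xi) < 2:
            continue
        p = len(Xi) / n_samples
        sides = Xi.max(axis=0) - Xi.min(axis=0)
        log_vol = np.log(np.maximum(sides, 1e-12)).sum()
        cost += p * (-np.log(p) + log_vol)
    return cost
```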

    Cross-entropy based image thresholding

    This paper presents a novel global thresholding algorithm for the binarization of documents and gray-scale images using Cross-Entropy Clustering. In the first step, a gray-level histogram is constructed and Gaussian densities are fitted to it. The thresholds are then determined as the cross-points of the Gaussian densities. This approach automatically detects the number of components (only an upper limit on the number of Gaussian densities is required).
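    A hedged sketch of the cross-point idea, with sklearn's GaussianMixture standing in for the CEC fitting step (the paper fits the densities with CEC and also detects the component count; we show only the two-component case):

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def cross_point_threshold(gray_values):
    """Fit two Gaussians to the gray levels and put the threshold where
    their weighted densities cross (illustration only; the paper uses
    CEC rather than EM to fit the densities)."""
    gm = GaussianMixture(n_components=2, random_state=0)
    gm.fit(np.asarray(gray_values, dtype=float).reshape(-1, 1))
    w = gm.weights_
    m = gm.means_.ravel()
    s = np.sqrt(gm.covariances_.ravel())
    t = np.linspace(0, 255, 2561)
    d0 = w[0] * norm.pdf(t, m[0], s[0])
    d1 = w[1] * norm.pdf(t, m[1], s[1])
    lo, hi = sorted(m)
    inside = (t > lo) & (t < hi)        # cross-point lies between means
    return t[inside][np.argmin(np.abs(d0[inside] - d1[inside]))]
```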

    Subspace memory clustering

    We present a new subspace clustering method called SuMC (Subspace Memory Clustering), which allows us to efficiently divide a dataset D ⊂ R^N into k ∈ N pairwise disjoint clusters of possibly different dimensions. Since our approach is based on memory compression, we do not need to explicitly specify the dimensions of groups: in fact, we only need to specify the mean number of scalars used to describe a data point. In the case of one cluster, our method reduces to the classical Karhunen-Loeve (PCA) transform. We test our method on typical data from the UCI repository and on data coming from real-life experiments.
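    The memory trade-off can be made concrete with a small sketch (our names, not the paper's code): for a candidate cluster, the residual error of describing points with a d-dimensional Karhunen-Loeve (PCA) subspace is the sum of the discarded covariance eigenvalues.

```python
import numpy as np

def pca_residual_errors(Xi):
    """For cluster data Xi (m x N), return error(d) for d = 0..N: the
    total residual variance left after keeping the top-d principal
    directions, i.e. the sum of the N-d smallest covariance eigenvalues."""
    Xc = Xi - Xi.mean(axis=0)
    evals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
    total = evals.sum()
    return np.concatenate(([total], total - np.cumsum(evals)))
```

    Under a fixed budget of mean scalars per point, the algorithm can then spend dimensions on the cluster whose next principal direction buys the largest error reduction, so per-cluster dimensions never need to be specified explicitly.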

    SVM with a neutral class

    In many real binary classification problems, in addition to the presence of positive and negative classes, we are also given examples of a third, neutral class, i.e., examples with an uncertain or intermediate state between positive and negative. Although it is common practice to ignore the neutral class in the learning process, its appropriate use can lead to an improvement in classification accuracy. In this paper, to include neutral examples in the training stage, we adapt two variants of Tri-Class SVM (proposed by Angulo et al. in Neural Process Lett 23(1):89–101, 2006), a method designed to solve three-class problems with a single learning model. In analogy to classical SVM, we look for a hyperplane which maximizes the margin between positive and negative instances and which is localized as close to the neutral class as possible. Going beyond Angulo's original paper, we give a new interpretation of the model and show that it can be easily implemented in the primal. Our experiments demonstrate that the considered methods obtain better results in binary classification problems than classical SVM and semi-supervised SVM.
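    Since the abstract stresses a primal implementation, here is a toy primal sketch; the exact Tri-Class losses differ, and the absolute-distance penalty C_neu on neutral points is our illustrative stand-in for "keep the hyperplane close to the neutral class".

```python
import numpy as np

def neutral_svm_primal(X, y, X_neu, C=1.0, C_neu=0.1, lr=1e-3, epochs=500):
    """Toy subgradient descent on: 0.5*||w||^2
       + C * sum hinge(1 - y*(w.x + b))        (positive/negative margin)
       + C_neu * sum |w.x + b| over neutrals   (hyperplane near neutrals).
    The penalty choice is our illustration, not the paper's exact loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1              # margin violators
        gw = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        gb = -C * y[viol].sum()
        s = np.sign(X_neu @ w + b)              # subgradient of |.|
        gw += C_neu * (s[:, None] * X_neu).sum(axis=0)
        gb += C_neu * s.sum()
        w -= lr * gw
        b -= lr * gb
    return w, b
```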

    Online updating of active function cross-entropy clustering

    Gaussian mixture models have many applications in density estimation and data clustering. However, the model does not adapt well to curved and strongly nonlinear data, since many Gaussian components are typically needed to adequately fit data that lie around a nonlinear manifold. To solve this problem, the active function cross-entropy clustering (afCEC) method was constructed. In this article, we present an online afCEC algorithm. Thanks to this modification, we obtain a method which is able to remove unnecessary clusters very quickly and, consequently, has lower computational complexity. Moreover, it reaches a better minimum (with a lower value of the cost function). The modification also allows the method to process data streams.
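    A sketch of the streaming bookkeeping such an online variant needs (afCEC itself uses curved "active function" densities; we show only the Gaussian part, with Welford-style incremental updates):

```python
import numpy as np

class OnlineGaussianCluster:
    """Incrementally maintained mean/covariance of one cluster, so a
    stream point can be absorbed, and the cluster's cost contribution
    re-scored for possible removal, without revisiting old data."""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.M2 = np.zeros((dim, dim))   # running sum of outer products

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.M2 += np.outer(delta, x - self.mean)

    @property
    def cov(self):
        return self.M2 / max(self.n, 1)
```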

    Set Aggregation Network as a Trainable Pooling Layer

    Global pooling, such as max- or sum-pooling, is one of the key ingredients in deep neural networks used for processing images, texts, graphs and other types of structured data. Based on the recent DeepSets architecture proposed by Zaheer et al. (NIPS 2017), we introduce a Set Aggregation Network (SAN) as an alternative global pooling layer. In contrast to typical pooling operators, SAN allows us to embed a given set of features into a vector representation of arbitrary size. We show that by adjusting the size of the embedding, SAN is capable of preserving all the information from the input. In experiments, we demonstrate that replacing the global pooling layer with SAN leads to an improvement in classification accuracy. Moreover, it is less prone to overfitting and can be used as a regularizer.
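    A minimal PyTorch sketch of the pooling idea (a DeepSets-style stand-in, not the paper's exact SAN layer): each set element is projected and activated, and the results are summed, so the embedding size out_dim can be chosen freely rather than being fixed by the input.

```python
import torch
import torch.nn as nn

class SetAggregation(nn.Module):
    """Pools a variable-size set of features (batch, n_elements, in_dim)
    into a fixed vector (batch, out_dim); growing out_dim lets the layer
    retain more of the input information than max- or sum-pooling."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                     # x: (batch, n_elements, in_dim)
        return torch.relu(self.proj(x)).sum(dim=1)

pool = SetAggregation(in_dim=64, out_dim=256)
z = pool(torch.randn(8, 100, 64))             # -> shape (8, 256)
```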